Abstract: Huffman coding finds an optimal prefix code for a given probability mass function. Consider situations in which one wishes to find an optimal code with the restriction that all codewords have lengths that lie in a user-specified set of lengths (or, equivalently, no codewords have lengths that lie in a complementary set). This paper introduces a polynomial-time dynamic programming algorithm that finds optimal codes for this reserved-length prefix coding problem. This has applications to quickly encoding and decoding lossless codes. In addition, one modification of the approach solves any quasiarithmetic prefix coding problem, while another finds optimal codes restricted to the set of codes with g codeword lengths for user-specified g (e.g., g = 2).



I. Introduction 



A source emits symbols drawn from the alphabet $\mathcal{X} = \{1, 2, \ldots, n\}$. Symbol $i$ has probability $p_i$, thus defining probability mass function vector $\mathbf{p}$. We assume without loss of generality that $p_i > 0$ for every $i \in \mathcal{X}$, and that $p_i \le p_j$ for every $i > j$ ($i, j \in \mathcal{X}$). The source symbols are coded into binary codewords. The codeword $c_i$ corresponding to symbol $i$ has length $l_i$, thus defining length vector $\mathbf{l}$.

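It is well known that Huffman coding [1] yields a prefix code minimizing
$$\sum_{i \in \mathcal{X}} p_i l_i$$
given the natural coding constraints: the integer constraint, $l_i \in \mathbb{Z}^+$, and the Kraft (McMillan) inequality [2]:
$$\kappa(\mathbf{l}) \triangleq \sum_{i \in \mathcal{X}} 2^{-l_i} \le 1. \qquad (1)$$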



Since an exchange argument (e.g., [3, pp. 124-125]) easily shows that an optimal code exists which has monotonic nondecreasing lengths, we can assume without loss of generality that such minimum-redundancy codes have $l_i \ge l_j$ for every $i > j$ ($i, j \in \mathcal{X}$).
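For reference, a minimal Python sketch of the unrestricted baseline: binary Huffman codeword lengths and the Kraft sum $\kappa(\mathbf{l})$ of (1). The function names `huffman_lengths` and `kraft_sum` are illustrative, not from the paper, and the heap-based merging is the textbook Huffman construction rather than anything specific to this work.

```python
import heapq
from fractions import Fraction


def huffman_lengths(p):
    """Return binary Huffman codeword lengths, one per probability in p."""
    if len(p) == 1:
        return [1]  # a lone symbol still gets a one-bit codeword (l_i >= 1)
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(pi, i, [i]) for i, pi in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    next_id = len(p)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:  # merging pushes every symbol below one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next_id, s1 + s2))
        next_id += 1
    return lengths


def kraft_sum(lengths):
    """Exact Kraft sum kappa(l) = sum_i 2^{-l_i} from inequality (1)."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


lengths = huffman_lengths([0.5, 0.25, 0.125, 0.125])
print(lengths, kraft_sum(lengths))  # [1, 2, 3, 3] 1
```

For the dyadic distribution (1/2, 1/4, 1/8, 1/8) the lengths are 1, 2, 3, 3 and the Kraft sum is exactly 1; using `Fraction` keeps the sum free of rounding error.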

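There has been much work on solving this problem with other objectives and/or additional constraints [4]. One especially useful constraint [5], [6] is that of length-limited coding, in which
$$l_i \in \{1, 2, \ldots, l_{\max}\} \quad \forall i \in \mathcal{X}.$$
Here we consider instead the reserved-length constraint:
$$l_i \in \Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_{|\Lambda|}\} \quad \forall i \in \mathcal{X}$$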



for $\lambda_j \in \mathbb{Z}^+$ for all $j$, with $\lambda_1 < \lambda_2 < \cdots < \lambda_{|\Lambda|}$. In this case, instead of restricting the range of codeword lengths to an interval as in length-limited coding, it is restricted to an arbitrary set of lengths. (As demonstrated in the next section, there is no loss of generality in assuming this set to be finite.) The problem is well-formed if and only if $\lambda_{|\Lambda|} \ge \log_2 n$.
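The well-formedness condition amounts to checking that putting every codeword at the longest allowed length fits in the tree, i.e., $2^{\lambda_{|\Lambda|}} \ge n$. A one-line Python sketch, assuming a finite set of allowed lengths; the helper name `is_well_formed` is hypothetical.

```python
def is_well_formed(n, allowed):
    """True iff n symbols admit a prefix code using only lengths in `allowed`.

    Placing all n codewords at the longest allowed length L is the roomiest
    choice, so the test is 2**L >= n (i.e., max(allowed) >= log2 n), done exactly.
    """
    return len(allowed) > 0 and (1 << max(allowed)) >= n


print(is_well_formed(9, {1, 2, 4, 8}))  # True
print(is_well_formed(9, {1, 2, 3}))     # False
```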

This problem was proposed in the 1980s but, due to the lack 
of a solution, never published [7]. A practical application is 
that of fast data decompression. Perhaps the greatest bottleneck 
in fast Huffman decoding is the determination of codeword 
length from input bits, which can be done using a lookup table, 
a linear search, or a decision tree, depending on the complexity 
of the code involved [5]. The average time taken by a linear 
search or an optimal decision tree increases with the number of 
possible codeword lengths, so limiting the number of possible 
codeword lengths can make decoding faster; if the resulting 
increase in expected codeword length is small or zero, this 
can be an effective way of trading off compression and speed, 
with no compression on one end of the spectrum and optimal 
compression on the other end. 

Consider the optimal prefix code for random variable $Z$ drawn from the Zipf distribution with $n = 2^{12}$, that is,
$$P[Z = i] = \frac{1/i}{\sum_{j=1}^{n} 1/j},$$

which is approximately equal to the distribution of the n most 
common words in the English language [8, p. 89]. This code 
has codewords of 13 different lengths, with an average length 
of about 8.78 bits. If one were to restrict this code to only 
allow codewords of lengths in {5, 9, 14}, the resulting optimal 
restricted code would have an average length of about 9.27 
bits. Although suboptimal, this restricted code would decode 
more quickly than the optimal unrestricted code. 

An $O(n^4)$-time, $O(n^3)$-space dynamic programming approach, introduced shortly, finds optimal reserved-length binary prefix codes. Variants of this algorithm solve a related length constraint and any case of the quasiarithmetic coding problem introduced by Campbell [9], extending the result of [10].

II. Preliminaries and Algorithm 

Many prefix coding problems, most notably binary Huffman coding and binary length-limited "Huffman" coding, must return an optimal code in which the Kraft inequality (1) is satisfied with equality, that is, for which $\kappa(\mathbf{l}) = 1$. For nonbinary problems, although the corresponding inequality is not always satisfied with equality, a simple modification to the problem changes this, causing the inequality to always be an equality for optimal codes [1], [11]. This is not the case for the reserved-length problem. For example, if $n = 3$ and the allowed lengths are 1 and 3, then the optimal code must have lengths 1, 3, and 3, resulting in a code for which $\kappa(\mathbf{l}) = 0.75$. Moreover, it is not clear how to determine $\kappa(\mathbf{l})$ for the optimal code other than to calculate the optimal code itself. Huffman coding and the most common length-limited approaches rely on $\kappa(\mathbf{l}) = 1$, so these methods cannot be used to find an optimal code here.
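A brute-force check makes the $n = 3$ example concrete. This Python sketch (hypothetical name `brute_force_reserved`) enumerates every nondecreasing length vector drawn from the allowed set, which the exchange argument permits, and keeps the cheapest one satisfying (1). It is exponential in $n$ and meant only as a reference for tiny inputs; the probabilities 0.5, 0.3, 0.2 below are an assumed example, not from the paper.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def brute_force_reserved(p, allowed):
    """Exhaustive reference solver for tiny n (p sorted nonincreasing).

    Tries every nondecreasing length vector over `allowed`, keeps those
    whose Kraft sum is at most 1, and returns (expected_length, lengths)
    for the cheapest, or None if no vector is feasible.
    """
    best = None
    for lengths in combinations_with_replacement(sorted(allowed), len(p)):
        if sum(Fraction(1, 2 ** l) for l in lengths) > 1:
            continue  # violates the Kraft inequality (1)
        cost = sum(pi * li for pi, li in zip(p, lengths))
        if best is None or cost < best[0]:
            best = (cost, list(lengths))
    return best


print(brute_force_reserved([0.5, 0.3, 0.2], {1, 3}))
```

With allowed lengths {1, 3} it returns lengths [1, 3, 3] (expected length 2.0 up to floating-point rounding), whose Kraft sum is 3/4 rather than 1.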

The Kraft inequality is often explained in terms of a coding tree. A binary coding tree is a rooted binary tree in which the leaves represent items to be coded. Along the path to a leaf, if the $j$th edge goes to the left child, the $j$th bit of the codeword is a 0; otherwise, it is a 1. For a finite code tree, the Kraft inequality is an equality if and only if every node has 0 or 2 children, that is, if the tree is full. This assumption needs to be relaxed for finding an optimal reserved-length prefix code.

One approach that does not require $\kappa(\mathbf{l}) = 1$ is dynamic programming. Many prefix coding solutions use dynamic programming techniques [4], e.g., finding optimal codes for which all codewords end with a '1' bit [12], a situation in which, necessarily, a finite code cannot have $\kappa(\mathbf{l}) = 1$. For the current problem, the dynamic programming algorithm should find, for increasing tree heights, a set of candidate trees from which to choose, and it should terminate when the longest feasible length is encountered. First, however, we have to find this longest feasible length, since we did not require that $\Lambda$, the set of allowed lengths, be upper-bounded by any function of $n$ or even be finite.

Theorem 1: Any codeword length $l_i$ of an optimal reserved-length code either satisfies $l_i \le n - 2$ or $l_i = \lambda_\infty$, where $\lambda_\infty$ is the smallest element of $\Lambda$ that satisfies $\lambda_\infty > n - 2$.

Proof: We first show that no partial Kraft sum of $x$ items,
$$\kappa(\mathbf{l}, x) \triangleq \sum_{i=1}^{x} 2^{-l_i},$$
can be in the open interval $(1 - 2^{-x}, 1)$, and, furthermore, if the longest codeword is of length $l_x > x - 1$, the sum cannot be in $(1 - 2^{-x+1} + 2^{-l_x}, 1)$. This is shown by induction on codeword lengths of nondecreasing order. Clearly $\kappa(\mathbf{l}, 2) \notin (3/4, 1)$, so the claim holds for $x = 2$. Suppose the Kraft sum for $x - 1$ items cannot fall in $(1 - 2^{-x+1}, 1)$, that is, for any code for which $\kappa(\mathbf{l}, x - 1) < 1$, $\kappa(\mathbf{l}, x - 1) \le 1 - 2^{-x+1}$. Since the $x$th term is a power of two, the partial sum of a code is no greater than $1 - 2^{-x+1} + 2^{-x} = 1 - 2^{-x}$ for $\kappa(\mathbf{l}, x) < 1$. Moreover, if $l_x \ge x$, the partial sum is less than or equal to $1 - 2^{-x+1} + 2^{-l_x}$.

Now suppose there is an optimal code for $n$ items which includes codeword lengths $l_\mu$ and $l_\nu$, where $n - 2 < l_\mu < l_\nu$. Assume without loss of generality that $l_\mu$ and $l_\nu$ are the longest codeword lengths and $l_\nu = l_n$ (i.e., $l_\nu$ is the longest codeword length). Note that $l_\nu \ge n$, and the Kraft sum cannot equal 1 for any code in which the longest codeword has length equal to or exceeding $n$; it is well known that the deepest full tree is a terminated unary tree, one with depth $n - 1$. Thus $\kappa(\mathbf{l}) \le 1 - 2^{-n}$. Consider a code with lengths $l'_i = l_i$ for $i < n$ and $l'_n = l_\mu$. We show that a prefix code exists with these lengths and thus achieves greater compression, rendering $\mathbf{l}$ suboptimal. If $l_\nu = l_\mu + 1$, then
$$\kappa(\mathbf{l}') = \kappa(\mathbf{l}) - 2^{-l_\nu} + 2^{-l_\mu} \le 1 - 2^{-n} + 2^{-l_\mu - 1} \le 1$$
since $n \le l_\mu + 1$. Otherwise, $l_n = l_\nu > n$, so the second part of the claim (with $x = n$) gives $\kappa(\mathbf{l}) \le 1 - 2^{-n+1} + 2^{-l_\nu}$, and
$$\kappa(\mathbf{l}') = \kappa(\mathbf{l}) - 2^{-l_\nu} + 2^{-l_\mu} \le 1 - 2^{-n+1} + 2^{-l_\mu} \le 1$$
since $n - 1 \le l_\mu$. ■
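Theorem 1 lets an arbitrary, even infinite, set of allowed lengths be cut down to a finite one before any search. A Python sketch, assuming the allowed lengths arrive as a strictly increasing iterable; the helper names `truncate_allowed` and `powers_of_two` are hypothetical.

```python
from itertools import count


def truncate_allowed(allowed, n):
    """Apply Theorem 1 to a strictly increasing iterable of allowed lengths.

    Keeps every length <= n - 2 and the first length > n - 2 (lambda_infinity),
    then stops, so an infinite increasing iterable of integers is fine.
    """
    kept = []
    for lam in allowed:
        kept.append(lam)
        if lam > n - 2:  # lambda_infinity reached; longer lengths never help
            break
    return kept


def powers_of_two():
    """An infinite allowed set: 1, 2, 4, 8, ..."""
    return (2 ** k for k in count())


print(truncate_allowed(powers_of_two(), 9))  # [1, 2, 4, 8]
```

For the powers of two and $n = 9$, this keeps 1, 2, and 4 (all at most $n - 2 = 7$) and stops at 8, the smallest power of two exceeding 7.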
Since an optimal tree exists which has monotonic nondecreasing lengths, optimal codeword lengths can be fully specified by the number of leaves on each of the allowed levels of the code tree. For such an optimal tree, given any "allowed level" $\lambda_m$, the lengths with $l_i \le \lambda_m$ have a partial Kraft sum
$$\kappa_{\lambda_m}(\mathbf{l}) \triangleq \kappa(\mathbf{l}, \nu_m)$$
for $\nu_m$ such that $l_{\nu_m} \le \lambda_m$ and either $l_{\nu_m + 1} > \lambda_m$ or $\nu_m = n$. This Kraft sum is a multiple of $2^{-\lambda_m}$, so there exists an $\eta_m$ such that $\kappa(\mathbf{l}, \nu_m) = 1 - \eta_m 2^{-\lambda_m}$, and this $\eta_m$ is the number of internal nodes on level $\lambda_m$ of any coding tree corresponding to the codeword lengths.
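The pair $(\nu_m, \eta_m)$ is the state the dynamic program tracks, so it may help to compute it directly from a known length vector. A Python sketch (hypothetical name `level_states`) using the integer form $\eta_m = 2^{\lambda_m} - \sum_{i \le \nu_m} 2^{\lambda_m - l_i}$, which equals $2^{\lambda_m}(1 - \kappa(\mathbf{l}, \nu_m))$ without fractions:

```python
def level_states(lengths, allowed):
    """Return (nu_m, eta_m) for each allowed level lambda_m, in increasing order.

    nu_m  = number of codewords of length <= lambda_m;
    eta_m = 2^{lambda_m} * (1 - partial Kraft sum), computed with integers,
            i.e. the nodes on level lambda_m not occupied by, or below,
            a leaf of length <= lambda_m.
    """
    states = []
    for lam in sorted(allowed):
        used = [l for l in lengths if l <= lam]
        eta = (1 << lam) - sum(1 << (lam - l) for l in used)
        states.append((len(used), eta))
    return states


print(level_states([2, 2, 4, 4, 4, 4, 4, 4, 4], {1, 2, 4, 8}))
# [(0, 2), (2, 2), (9, 1), (9, 16)]
```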

In an optimal coding tree, if $\Delta_m$ is defined to be $\lambda_{m+1} - \lambda_m$, then, for any $\nu_m < n$ (with $m < |\Lambda|$),
$$\underbrace{2^{\Delta_m}\eta_m}_{\text{internal nodes next}} - \underbrace{(2^{\Delta_m} - 2)}_{\text{single-node expansion factor}} \le \underbrace{n - \nu_m}_{\text{leaves under } m}. \qquad (2)$$
This can be seen by observing that, if a code violates this, then $(\eta_m - 1)2^{\Delta_m} \ge n - \nu_m - 1$, so we can produce a code with the same lengths for $l_1$ through $l_{\nu_m}$ and assign $l_{\nu_m+1} = \lambda_m$ and $l_i = \lambda_{m+1}$ for $i > \nu_m + 1$, and the new code would have no length exceeding that of the original code; in fact, $l_{\nu_m+1}$ is strictly shorter, so the original code could not be optimal. For $\lambda_{m+1} = \lambda_m + 1$, this condition is identical to
$$2\eta_m \le n - \nu_m, \qquad (3)$$
which, for general $\Delta_m$, is a looser necessary condition for optimality. For similar reasons, no optimal tree will have a partial tree with $\nu_m = n - 1$ for any $m$, since using an internal node on level $\lambda_m$ for the final item results in an improved tree.
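Condition (2) is cheap to test, which is what makes it useful for pruning. A Python sketch (hypothetical name `satisfies_condition_2`), with $\Delta_m$ passed as `delta`; setting `delta = 1` reduces it to (3).

```python
def satisfies_condition_2(nu, eta, n, delta):
    """Necessary condition (2) for a partial tree of an optimal code.

    nu < n leaves lie at or above level lambda_m, eta internal nodes sit on it,
    and delta = lambda_{m+1} - lambda_m >= 1.
    """
    return (eta << delta) - ((1 << delta) - 2) <= n - nu


print(satisfies_condition_2(nu=2, eta=2, n=9, delta=2))  # True: 8 - 2 <= 7
print(satisfies_condition_2(nu=0, eta=4, n=9, delta=2))  # False: 16 - 2 > 9
```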

Such properties can be used to construct a dynamic programming algorithm. In describing this algorithm, we use the following notation:

$\nu_m$: Used up leaves at or above level $\lambda_m$

$\eta_m$: Nodes internal at level $\lambda_m$

$T[m, \nu, \eta]$: Leaves above level $\lambda_m$

$L[m, \nu, \eta]$: $\sum_{i=1}^{\nu} p_i l_i + \lambda_m \sum_{i=\nu+1}^{n} p_i$

The idea for the algorithm is to calculate the optimal $L[m, \nu_m, \eta_m]$ given feasible values of partial trees ($\nu_m < n - 1$) and to separately keep track of the best finished tree ($\nu_m = n$) as the algorithm progresses. The trees grow by level ($\lambda_m$ for increasing $m$), while the algorithm calculates all feasible values of $\nu_m$ (which are in $[0, n - 2]$ for partial trees) and $\eta_m$ (which are in $[0, \lfloor n/2 \rfloor]$ for partial trees due to (3); if $\eta_m > n/2$, at least one node on a lower level could be shortened to length $\lambda_m$, resulting in a strictly improved code). Thus there are $O(n^2)$ values per level, and we can try all feasible combinations, calculating $L$ for all combinations of partial trees (saving optimal combinations) and finished trees (saving only the best finished tree encountered up to this point). Clearly, $\nu_m$ must be nondecreasing. This, along with the bounds on $\eta_m$, is used to limit the combinations tried. In cases where $|\Lambda|$ is much smaller than $n$, additional constraints based on (2) can be imposed, but such constraints do not improve computational complexity in the general case, so we do not discuss them here.

After the final level $\lambda_{|\Lambda|}$ is processed, the optimal tree is rebuilt via backtracking. Assuming arithmetic operations are constant-time, the dynamic programming Algorithm 1 requires $O(|\Lambda| n^3)$ time and $O(|\Lambda| n^2)$ space. Because $|\Lambda| < n$ without loss of generality (by Theorem 1), the time complexity is $O(n^4)$ and the space complexity $O(n^3)$.

A simple example of this algorithm at work is in finding an optimal code for Benford's law [13], [14] with the restriction that all codeword lengths must be powers of two. In this case, $p_i = \log_{10}(i + 1) - \log_{10}(i)$ for $i$ from 1 to $n = |\mathcal{X}| = 9$, and $\Lambda = \{1, 2, 4, 8\}$ is a sufficient set of lengths to allow, due to Theorem 1 (8 being the smallest power of two exceeding $n - 2 = 7$). The calculated values for each feasible partial $L[m, \nu, \eta]$ are shown in Table I.

On the first level, $\lambda_1$, average length is identical to the level number, and if, for example, $\lambda_1 = 1$, the nodes at the level can include zero ($(\nu, \eta) = (0, 2)$), one ($(1, 1)$), or two ($(2, 0)$) terminating nodes, which are the only nontrivial entries in a two-dimensional grid for this level, as indicated by the first grid in Table I. From each nontrivial entry in the level $\lambda_1$ grid, all allowed combinations of terminating and expanding are considered until the second (level $\lambda_2$) grid is arrived at, and the algorithm proceeds similarly until all allowed levels are accounted for. All trees with $\nu = n$ (all leaves accounted for) are compared with the best one so far in order to find an optimal tree. In the Benford's law example, this is a tree with two codewords of length two and seven codewords of length four. Note that the strict inequality of line 29 means that, if there are multiple optimal length vectors, the algorithm selects one of minimal maximum length.
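The following Python sketch follows the level-by-level structure described above: states $(\nu, \eta)$, the cost update of line 16, pruning by (3), and a strict comparison for finished trees so that ties favor a smaller maximum length. It is an illustration under stated assumptions rather than a transcription of Algorithm 1: it uses dictionaries keyed by $(\nu, \eta)$ in place of the dense arrays $L$ and $T$, stores a predecessor state for backtracking, returns lengths but not codewords, and assumes `p` is sorted in nonincreasing order and `allowed` is finite. The name `reserved_length_code` is hypothetical.

```python
import math


def reserved_length_code(p, allowed):
    """Level-by-level DP sketch for reserved-length prefix coding.

    p: probabilities sorted in nonincreasing order; allowed: finite set of
    allowed lengths. Returns (expected_length, lengths) or None if infeasible.
    """
    n = len(p)
    # Theorem 1: keep lengths <= n - 2 plus the smallest length > n - 2.
    lams = sorted(l for l in set(allowed) if l <= n - 2)
    longer = [l for l in allowed if l > n - 2]
    if longer:
        lams.append(min(longer))
    F = [0.0]  # F[v] = p_1 + ... + p_v
    for pi in p:
        F.append(F[-1] + pi)

    prev = {(0, 1): 0.0}  # (nu, eta) -> partial cost L; initially just the root
    back = []             # back[m][(nu, eta)] = state on the previous level
    best, best_end = math.inf, None
    prev_lam = 0
    for m, lam in enumerate(lams):
        cur, ptr = {}, {}
        for (nu, eta), cost in prev.items():
            grown = eta << (lam - prev_lam)  # nodes available on level lam
            new_cost = cost + (lam - prev_lam) * (1.0 - F[nu])
            # Finished tree: every remaining item becomes a leaf on this level.
            # Strict "<" keeps the earliest (smallest maximum length) optimum.
            if nu + grown >= n and new_cost < best:
                best, best_end = new_cost, (m, (nu, eta))
            if m == len(lams) - 1:
                continue  # no partial trees on the last allowed level
            for nu2 in range(nu, min(nu + grown, n - 2) + 1):
                eta2 = grown - (nu2 - nu)  # nodes left internal
                if eta2 < 1 or 2 * eta2 > n - nu2:  # prune using condition (3)
                    continue
                if new_cost < cur.get((nu2, eta2), math.inf):
                    cur[(nu2, eta2)] = new_cost
                    ptr[(nu2, eta2)] = (nu, eta)
        back.append(ptr)
        prev, prev_lam = cur, lam
    if best_end is None:
        return None

    # Backtrack: items nu+1..n sit on the finishing level; walk upward.
    m, (nu, eta) = best_end
    lengths = [lams[m]] * n
    while m > 0:
        m -= 1
        pnu, peta = back[m][(nu, eta)]
        for i in range(pnu, nu):  # leaves placed on level lams[m]
            lengths[i] = lams[m]
        nu, eta = pnu, peta
    return best, lengths


benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]
print(reserved_length_code(benford, {1, 2, 4, 8}))
```

On the Benford's law example with $\Lambda = \{1, 2, 4, 8\}$, this returns lengths (2, 2, 4, 4, 4, 4, 4, 4, 4) with expected length about 3.0458, matching the tree described above.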

Note that a similar approach could be used for nonbinary trees, although an efficient exponentiation procedure should be used in place of shifting in lines 15 and 47 of Algorithm 1. Codeword construction (lines involving $c$) and the aforementioned expansion bounds (lines involving $\lfloor |\mathcal{X}|/2 \rfloor$) also need adjustment for nonbinary cases. These alterations do not worsen computational complexity.



III. Extensions and Conclusion 

The aforementioned method yields a prefix code minimizing 
expected length for a known finite probability mass function 
under the given constraints. However, there are many varied 
instances in which expected length is not the proper value to 
minimize [4]. Many such problems are in a certain family 
of generalizations of the Huffman problem introduced by 
Campbell in [15]. 

While Huffman coding minimizes $\sum_{i \in \mathcal{X}} p_i l_i$, Campbell's quasiarithmetic formulation adds a continuous, strictly monotonic increasing cost function $\varphi(l): \mathbb{R}^+ \to \mathbb{R}^+$. The value to minimize is then
$$L(\mathbf{p}, \mathbf{l}, \varphi) = \varphi^{-1}\left(\sum_{i \in \mathcal{X}} p_i \varphi(l_i)\right).$$

The case of convex $\varphi$ has been solved [10]. For nonconvex functions, it suffices to replace line 16 in the algorithm,
$$L' \gets L[m-1, \nu, \eta] + (\lambda' - \lambda'')(1 - F_\nu),$$
with
$$L' \gets L[m-1, \nu, \eta] + (\varphi(\lambda') - \varphi(\lambda''))(1 - F_\nu).$$

The exchange argument still holds, resulting in a monotonic solution, and $\Lambda$ still has cardinality less than $n$, so the algorithm proceeds similarly for identical reasons, and thus with the same complexity. A nonbinary coding extension is similar to that used to minimize expected length.
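To evaluate a code under Campbell's criterion, a Python sketch of $L(\mathbf{p}, \mathbf{l}, \varphi)$ follows; the caller supplies $\varphi$ and its inverse, and the names `quasiarithmetic_penalty` and `exponential_penalty` are hypothetical. With the modified line 16 (and $\lambda'' = 0$ initially, assuming $\varphi(0)$ is defined), the per-level increments telescope to $\sum_i p_i(\varphi(l_i) - \varphi(0))$, which differs from $\sum_i p_i \varphi(l_i)$ only by the constant $\varphi(0)$ when the probabilities sum to 1, so minimizing one minimizes the other.

```python
import math


def quasiarithmetic_penalty(p, lengths, phi, phi_inv):
    """Return phi^{-1}(sum_i p_i * phi(l_i)).

    phi is assumed continuous and strictly increasing, with inverse phi_inv.
    """
    return phi_inv(sum(pi * phi(li) for pi, li in zip(p, lengths)))


def exponential_penalty(p, lengths, t):
    """Campbell's exponential case, phi(x) = 2**(t*x) with t > 0."""
    return quasiarithmetic_penalty(p, lengths,
                                   lambda x: 2.0 ** (t * x),
                                   lambda y: math.log2(y) / t)


print(quasiarithmetic_penalty([0.5, 0.5], [1, 3], lambda x: x, lambda y: y))  # 2.0
print(exponential_penalty([0.5, 0.5], [1, 3], 1.0))  # log2(5), about 2.32
```

With $\varphi(x) = x$ this is the expected length; with $\varphi(x) = 2^{tx}$ for $t > 0$ it is the exponential penalty $\frac{1}{t}\log_2 \sum_i p_i 2^{t l_i}$.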

We earlier stated that one purpose for reserving lengths is to allow faster decoding by having fewer codeword lengths. However, if this is the objective, the problem remains of how to select the codeword lengths to use. We might, for example, restrict our solution to having two codeword lengths, but not put any restrictions on what these codeword lengths should be. Such a problem was examined analytically in [16] for $n$ approaching infinity. Here, we consider solving the problem for fixed $n$.

One approach to the two-length problem would be to try all feasible combinations of codeword lengths. We then have to find a set of candidate combinations, preferably a relatively small one, so as not to drastically increase the complexity of the problem.

First note that, if only one codeword length is used, then $\lambda_2 = \lambda_1 = \lceil \log_2 n \rceil$. Otherwise, we begin by observing that, for the best tree, the number of internal nodes and the number of leaves on the first allowed level $\lambda_1$ must each be greater than 0 (or else only one codeword length could be used) and combined be no greater than $n - 1$ (or else a better code exists with all codewords having one length). Thus $\lambda_1 \le \log_2(n - 1)$, or, put another way, $\lambda_1 \le \lceil \log_2 n \rceil - 1$. At the same time, the second allowed level cannot have $2n - 2$ or more combined internal nodes and leaves; otherwise an improved tree can be found by decreasing $\lambda_2$ by one, since no more than $n - 1$ leaves can be on this level. Because these nodes are all descendants of at least one internal node on the first allowed level, this results in $2^{\lambda_2 - \lambda_1} < 2n - 2$, which leads to $\lambda_2 - \lambda_1 \le \lceil \log_2(n - 1) \rceil \le \lceil \log_2 n \rceil$. Combining these results, we find that $\lambda_2 \le 2\lceil \log_2 n \rceil - 1$.
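These bounds translate directly into a short candidate list. A Python sketch (hypothetical names `ceil_log2` and `two_length_candidates`; assumes $n \ge 2$) that enumerates the single-length option and every pair with $\lambda_1 \le \lceil \log_2 n \rceil - 1$, $\lambda_2 - \lambda_1 \le \lceil \log_2 n \rceil$, and $\lambda_2 \le 2\lceil \log_2 n \rceil - 1$:

```python
def ceil_log2(n):
    """Exact ceil(log2(n)) for an integer n >= 1."""
    return (n - 1).bit_length()


def two_length_candidates(n):
    """(lambda_1, lambda_2) pairs permitted by the bounds above, for n >= 2.

    The single-length option appears as (c, c) with c = ceil(log2 n).
    """
    c = ceil_log2(n)
    pairs = [(c, c)]
    for lam1 in range(1, c):  # lambda_1 <= c - 1
        for lam2 in range(lam1 + 1, min(lam1 + c, 2 * c - 1) + 1):
            pairs.append((lam1, lam2))  # gap <= c and lambda_2 <= 2c - 1
    return pairs


print(len(two_length_candidates(9)))  # 13
```

For $n = 9$, $\lceil \log_2 9 \rceil = 4$, so there are 12 pairs plus the single-length option (4, 4); the list grows as $O(\log^2 n)$.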



Algorithm 1 Dynamic programming algorithm for reserved-length prefix coding
Require: $\mathbf{p}$ of size $n = |\mathcal{X}|$, $\Lambda$ for which (without loss of generality) $\lambda_{|\Lambda|-1} \le |\mathcal{X}| - 2$
1: $F_0 \gets 0$
2: for $i \gets 1, |\mathcal{X}|$ do
3:   $F_i \gets F_{i-1} + p_i$ {Calculate cumulative distribution function}
4: end for
5: for all $0 \le m \le |\Lambda|$, $0 \le \nu \le |\mathcal{X}| - 2$, $0 \le \eta \le \lfloor |\mathcal{X}|/2 \rfloor$ do
6:   $L[m, \nu, \eta] \gets \infty$ {Initialize partial tree costs}
7: end for
8: $L_{\min} \gets \infty$ {Best total tree cost so far}
9: $L[0, 0, 1] \gets 0$ {Trivial tree cost}
10: $\lambda'' \gets 0$ {Previous level}
11: for $m \gets 1, |\Lambda|$ {Level by level} do
12:   $\lambda' \gets \lambda_m$ {Current level}
13:   for all $(\nu, \eta) \in [0, |\mathcal{X}| - 2] \times [0, \lfloor |\mathcal{X}|/2 \rfloor]$ {Find optimal partial trees with given $m$ from $(m - 1, \nu, \eta)$} do
14:     if $L[m-1, \nu, \eta] < \infty$ then
15:       $\eta' \gets \eta \ll (\lambda' - \lambda'')$ {Total nodes on new level $\lambda_m$}
16:       $L' \gets L[m-1, \nu, \eta] + (\lambda' - \lambda'')(1 - F_\nu)$ {Cost on new level $\lambda_m$}
17:       if $m < |\Lambda|$ {Build partial trees (for which $m < |\Lambda|$)} then
18:         $\nu_{\min} \gets \max(\nu, 2(\nu + \eta') - |\mathcal{X}|)$ {Range of potential $\nu_m$}
19:         $\nu_{\max} \gets \min(\nu + \eta', |\mathcal{X}| - 2)$
20:         for $\nu' \gets \nu_{\min}, \nu_{\max}$ {Compare cost for all potential $\nu_m < |\mathcal{X}|$} do
21:           if $L[m, \nu', \eta' - \nu' + \nu] > L'$ then
22:             $L[m, \nu', \eta' - \nu' + \nu] \gets L'$ {New optimal partial cost for $(m, \nu_m, \eta_m) = (m, \nu', \eta' - \nu' + \nu)$}
23:             $T[m, \nu', \eta' - \nu' + \nu] \gets \nu$ {Save $\nu_{m-1}$ for backtracking}
24:           end if
25:         end for
26:       end if
27:     end if
28:     if $L[m-1, \nu, \eta] < \infty$ and $|\mathcal{X}| \le \nu + \eta'$ then
29:       if $L' < L_{\min}$ {Find best finished tree} then
30:         $L_{\min} \gets L'$ {Best finished tree cost}
31:         $(m_{\min}, \nu_{\min}, \eta_{\min}, x_{\min}) \gets (m, |\mathcal{X}|, \eta' - |\mathcal{X}| + \nu, |\mathcal{X}| - \nu)$ {Save optimal values, with $x_{\min} = \nu_m - \nu_{m-1}$, for backtracking}
32:       end if
33:     end if
34:   end for
35:   $\lambda'' \gets \lambda'$ {Current level now previous level}
36: end for
37: $(m, \nu, \eta, x) \gets (m_{\min}, \nu_{\min}, \eta_{\min}, x_{\min})$ {Backtrack to find optimal tree}
38: $c \gets (1 \ll \lambda_m) - \eta$ {1 greater than integer representation of final codeword}
39: while $m > 1$ {Rebuild best tree} do
40:   if $\nu < |\mathcal{X}|$ then
41:     $x \gets \nu - T[m, \nu, \eta]$ {Number of leaves on level $\lambda_m$}
42:   end if
43:   for $j \gets \nu$ down to $\nu - x + 1$ do
44:     $(l_j, c_j) \gets (\lambda_m, \{c + j - \nu - 1\}_{\lambda_m})$ {Assign lengths/codewords (where $\{x\}_y$ denotes the $y$-bit representation of $x$)}
45:   end for
46:   $c \gets (c - x) \gg (\lambda_m - \lambda_{m-1})$ {Start codewords of length $\lambda_{m-1}$}
47:   $(m, \nu, \eta) \gets (m - 1, \nu - x, (\eta + x) \gg (\lambda_m - \lambda_{m-1}))$ {Calculate new $(m, \nu, \eta)$ from old using $x$}
48: end while
49: for $j \gets \nu$ down to 1 do
50:   $(l_j, c_j) \gets (\lambda_1, \{j - 1\}_{\lambda_1})$ {Shortest lengths/codewords}
51: end for
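Lines 37 to 51 assign codewords level by level so that the codewords on each level are consecutive integers. A Python sketch of the same kind of assignment for any nondecreasing length vector satisfying (1), written in the familiar forward (canonical) order rather than by backtracking; the name `canonical_codewords` is hypothetical.

```python
def canonical_codewords(lengths):
    """Prefix-free codewords for nondecreasing lengths satisfying (1).

    Codewords of equal length are consecutive integers; moving to a longer
    length shifts the running value left by the difference in lengths.
    """
    codes = []
    value, prev = 0, lengths[0]
    for l in lengths:
        value <<= l - prev                     # descend to the new level
        codes.append(format(value, f"0{l}b"))  # l-bit binary representation
        value += 1
        prev = l
    return codes


print(canonical_codewords([1, 3, 3]))  # ['0', '100', '101']
```

For the Benford's law lengths (2, 2, 4, ..., 4) this yields 00, 01, and 1000 through 1110, leaving 1111 as the single unused node on level 4.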



[Table I grids, for $\mathbf{p}_{\text{Benford}} \approx \{0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046\}$ and $\Lambda = \{1, 2, 4, 8\}$: level $\lambda_1 = 1$ ($m = 1$); level $\lambda_2 = 2$ ($m = 2$); level $\lambda_3 = 4$ ($m = 3$).]

TABLE I

Grids for finding reserved-length solution via dynamic programming. Each value represents optimal partial cost for a given $\eta$, $\nu$, and level $\lambda$, with the number of leaves above the given level given in parentheses. The partial trees used in the optimal result (that terminated with $(m, \nu, \eta) = (3, 9, 2)$ having $\mathbf{l} = (2, 2, 4, 4, 4, 4, 4, 4, 4)$) are shown in boldface.



This result, while not the strictest bound possible, is sufficient for us to determine that the number of codeword length combinations one would have to try would be $O(\log^2 n)$. Thus, since $|\Lambda| = 2$ in all cases and only $O(1)$ data need be kept between combinations, the algorithm has only an $O(n^2)$ space and an $O(n^3 \log^2 n)$ time requirement, smaller than that of even the general version of the reserved-length problem. For example, the optimal two-length code for the Benford distribution has two codewords of length two and seven codewords of length four. This is the code found above to be optimal for lengths restricted to powers of two. This two-length code has average codeword length 3.04..., very near to that of the optimal unrestricted Huffman code, which has average codeword length 2.92....
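For two lengths the search is simple enough to sketch without the general dynamic program. For a fixed pair $\lambda_1 < \lambda_2$ and monotone lengths, the $k$ most probable symbols receive $\lambda_1$, the Kraft inequality (1) becomes $k 2^{\lambda_2 - \lambda_1} + (n - k) \le 2^{\lambda_2}$, and the expected length $\lambda_2 - (\lambda_2 - \lambda_1)F_k$ decreases in $k$, so the largest feasible $k$ is best. The Python sketch below (hypothetical name `best_two_length_code`; assumes $n \ge 2$ and nonincreasing `p`) applies this shortcut to every candidate pair allowed by the bounds on $\lambda_1$ and $\lambda_2$ derived above. This closed form is a swapped-in simplification for $g = 2$, not the procedure the text describes, which runs the general algorithm with $|\Lambda| = 2$.

```python
import math


def best_two_length_code(p):
    """Optimal code using at most two distinct codeword lengths.

    p: probabilities sorted nonincreasing, with n >= 2.
    Returns (expected_length, lengths).
    """
    n = len(p)
    c = (n - 1).bit_length()          # ceil(log2 n)
    F = [0.0]                         # F[k] = p_1 + ... + p_k
    for pi in p:
        F.append(F[-1] + pi)
    best = (c * F[n], [c] * n)        # single-length code
    for lam1 in range(1, c):
        for lam2 in range(lam1 + 1, min(lam1 + c, 2 * c - 1) + 1):
            slack = (1 << lam2) - n   # Kraft room left with all n at lam2
            if slack < 0:
                continue
            # Largest k with k * (2^(lam2 - lam1) - 1) <= slack.
            k = min(n, slack // ((1 << (lam2 - lam1)) - 1))
            cost = lam2 * F[n] - (lam2 - lam1) * F[k]
            if cost < best[0]:
                best = (cost, [lam1] * k + [lam2] * (n - k))
    return best


benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]
print(best_two_length_code(benford))
```

On the Benford distribution it returns two codewords of length two and seven of length four, with expected length about 3.0458.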

The two-length problem's solution can be easily generalized to that of a $g$-length problem, which can be optimally solved with $O(n^2 g)$ space and $O(n^3 (\log^g n) g^3)$ time in similar fashion. In fact, all $g'$-length problems, for $g' < g$, can be solved with this complexity, allowing for a selection of the desired trade-off between number of codeword lengths (speed) and expected codeword length (compression efficiency). Modifications can enact additional restrictions on codeword lengths (e.g., a limit on maximum length) in a straightforward fashion.

We thus find that this dynamic programming method is 
quite general, solving three problems that previously had no 
proposed polynomial-time solutions: the reserved-length prob- 
lem, Campbell's quasiarithmetic problem, and the g-length 
problem. 



References 

[1] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proc. IRE, vol. 40, no. 9, pp. 1098-1101, Sept. 1952.
[2] B. McMillan, "Two inequalities implied by unique decipherability," IRE Trans. Inf. Theory, vol. IT-2, no. 4, pp. 115-116, Dec. 1956.
[3] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. New York, NY: Wiley-Interscience, 2006.
[4] J. Abrahams, "Code and parse trees for lossless source encoding," Communications in Information and Systems, vol. 1, no. 2, pp. 113-146, Apr. 2001.
[5] A. Moffat and A. Turpin, "On the implementation of minimum redundancy prefix codes," IEEE Trans. Commun., vol. 45, no. 10, pp. 1200-1207, Oct. 1997.
[6] I. H. Witten, A. Moffat, and T. Bell, Managing Gigabytes, 2nd ed. San Francisco, CA: Morgan Kaufmann Publishers, 1999.
[7] Z. Zhang, Private communication, Feb. 2005.
[8] G. K. Zipf, "Relative frequency as a determinant of phonetic change," Harvard Studies in Classical Philology, vol. 40, pp. 1-95, 1929.
[9] L. L. Campbell, "Block coding and Rényi's entropy," Int. J. Math. Stat. Sci., vol. 6, no. 1, pp. 41-47, June 1997.
[10] M. B. Baer, "Source coding for quasiarithmetic penalties," IEEE Trans. Inf. Theory, vol. IT-52, no. 10, pp. 4380-4393, Oct. 2006.
[11] M. B. Baer, "D-ary bounded-length Huffman coding," in Proc. 2007 IEEE Int. Symp. on Information Theory, June 24-29, 2007, pp. 896-900.
[12] S.-L. Chan and M. J. Golin, "A dynamic programming algorithm for constructing optimal "1"-ended binary prefix-free codes," IEEE Trans. Inf. Theory, vol. IT-46, no. 4, pp. 1637-1644, July 2000.
[13] S. Newcomb, "Note on the frequency of use of the different digits in natural numbers," Amer. J. Math., vol. 4, no. 1/4, pp. 39-40, 1881.
[14] F. Benford, "The law of anomalous numbers," Proc. Amer. Phil. Soc., vol. 78, no. 4, pp. 551-572, Mar. 1938.
[15] L. L. Campbell, "Definition of entropy by means of a coding problem," Z. Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 6, pp. 113-118, 1966.
[16] E. Figueroa and C. Houdré, "On the asymptotic redundancy of lossless block coding with two codeword lengths," IEEE Trans. Inf. Theory, vol. IT-51, no. 2, pp. 688-692, Feb. 2005.



